Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow more complex GC policies #5079

Merged
merged 2 commits into from
Sep 26, 2024
Merged

Allow more complex GC policies #5079

merged 2 commits into from
Sep 26, 2024

Conversation

jedevc
Copy link
Member

@jedevc jedevc commented Jun 24, 2024

We've had some issues with users having difficulty understanding how the cache policy works in dagger, typically run in container environments. Additionally, I've also had issues explaining to users how to configure this in buildkit. One of the problems is with setting a reasonable default - for example, on a development machine, where buildkit has access to the full disk, setting 10% max disk is totally reasonable. But in docker desktop, where you might expect to run a lot of builds, that same 10% max disk is completely inappropriate and far too small. In GitHub actions, we'd also want to use much more of the available space.

Actually, a useful setting would be to be able to configure at how much free disk space the GC should kick in, rather than the consumed disk space. For example, I might prefer to configure the GC to kick in when there is less that 20% total disk space free. This new metric also takes into account other applications that might consume disk space as well (now I can configure buildkit without needing to be aware of what else is going to be taking up lots of space), and make the most of the space on the machine.

This PR replaces the existing keepBytes settings with 3 new toggles:

  • maxStorage - how much disk this policy is allowed to consume
  • minStorage - how much disk this policy is always allowed to consume (even when constrained by free)
  • free - how much disk should be left free

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, but which setting should take priority? I guess if I have keepBytes set to 20GB but only 10GB free space then it should still try to use up to 20 and fail if running out of space. Otherwise, if you accidentally run low on space because of some other tool, buildkit will just start to perform very poorly but there is no indication of why that is.

@jedevc
Copy link
Member Author

jedevc commented Jun 25, 2024

IMO, I think both settings should be followed - I think it would be quite useful and powerful to allow pruning at both constraints.

e.g. I might want to have keepBytes at 50GB, to consume max 50GB of cache - but if disk space is low, then I'd want it to use less, and always leave 10GB left.

Ideally, if we took this approach, we could modify the defaults for storage, and have freeBytes set to 1GB by default - just to prevent the accidental scenario where buildkit may consume all available space.

I'd argue that the performance metrics should be surfaced around progress/etc, docker desktop, dagger cloud, etc - this kind of poor performance already happens today on tooling like docker desktop when (naively) starting a docker container driver manually - the default disk space is small, and the 10% default is not great for large builds.

util/disk/disk_unix.go Outdated Show resolved Hide resolved
@tonistiigi
Copy link
Member

e.g. I might want to have keepBytes at 50GB, to consume max 50GB of cache - but if disk space is low, then I'd want it to use less, and always leave 10GB left.

I guess then we need min-storage, max-storage and free. We can keep the existing storage limit as one but need to reevaluate the defaults.

@jedevc
Copy link
Member Author

jedevc commented Jun 26, 2024

What would min-storage be in the above case? Is that the minimum amount of storage that the gc policy can consume (that doesn't make sense) 🤔

I think we would just need max-storage and free.

@tonistiigi
Copy link
Member

tonistiigi commented Jun 26, 2024

min-storage - amount BuildKit is always allowed to consume, even if free space constraint would say it should start deleting things. To make sure that BuildKit has some reserved storage that it can use for cache even if other apps are misbehaving.
max-storage - BuildKit will not go over this amount even if there is still plenty of free space on disk.

@jedevc
Copy link
Member Author

jedevc commented Sep 13, 2024

Refactored into minStorage/maxStorage/free as suggested - maxStorage is equivalent to the old keepBytes (because of this, we don't need to introduce caps at the control proto layer, which I don't think even exist?)

You can configure all of them in each policy, which I've done - I've also added some tests for manually verifying what happens when everything gets set together.

@jedevc jedevc marked this pull request as ready for review September 13, 2024 15:13
@jedevc
Copy link
Member Author

jedevc commented Sep 19, 2024

@tonistiigi @crazy-max any thoughts on this new approach?

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

api/types/worker.proto Show resolved Hide resolved
cmd/buildkitd/config/config.go Show resolved Hide resolved

// if we need to free up space, then decrease to that
if excess := opt.Free - dstat.Free; excess > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in Windows it looks like dstat can be empty. In that case we should not use the values from it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that's a typo - dstat should never be empty, the error should be propagated like in the original code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we just logged errors from that function (not sure if #5079 (comment) has been addressed yet). I'm not sure how likely this is but can we make it so that prune only errors if GetDiskStat fails and Free is set. For other deletions I think it could just continue even if the GetDiskStat doesn't give good result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, I've adjusted to do this now.

@tonistiigi
Copy link
Member

Btw, not releated to this PR (can fix it in separate) but noticed

 sandbox.go:138: time="2024-09-19T18:30:10Z" level=warning msg="failed to get disk size: no such file or directory"
    sandbox.go:138: time="2024-09-19T18:30:10Z" level=warning msg="failed to get disk size: no such file or directory"

in CI logs. Looks like if root directory does not exist then we try (and fail) to get disk info before we actually create the directory.

@jedevc jedevc force-pushed the free-bytes-gc branch 2 times, most recently from 88e7190 to 99849fa Compare September 24, 2024 13:59
We shouldn't use the cachemount root, we should actually properly
use the worker's specified root which is propagated from the config.

Signed-off-by: Justin Chadwell <[email protected]>
@jedevc jedevc changed the title Add a new freeBytes gc policy Allow more complex GC policies Sep 25, 2024
@tonistiigi tonistiigi merged commit b82235c into moby:master Sep 26, 2024
91 checks passed
@thompson-shaun thompson-shaun added this to the v0.17.0 milestone Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants